Bowles Notebooks
For practical reasons, the notebooks of the DSCI course have been mapped to individual Bowles chapters. The overview below shows which Bowles code listings (and, where relevant, figures and tables) were incorporated into which DSCI notebook.
| DSCI notebook | Bowles code (Python 2.7) | Figures, tables, etc. (if any) |
|---|---|---|
| (ipynb)dsci_intro_1 | No Bowles; Python only, covered in the intro session | |
| (ipynb)dsci_intro_2 | Differences between list, np.ndarray, pd.DataFrame and pd.Series | |
| (ipynb)Bowles_2.1_titanic | | |
| (ipynb)Bowles_2.2_rocks | Listing 2-1: Sizing Up a New Data Set (rockVmineSummaries.py; output: outputRocksVMinesSummaries.txt) | |
| | Listing 2-2: Determining the Nature of Attributes (rockVmineContents.py; output: outputRocksVMinesContents.txt) | |
| | Listing 2-3: Summary Statistics for Numeric and Categorical Attributes (rVMSummaryStats.py; output: outputSummaryStats.txt) | |
| | Listing 2-4: Quantile-Quantile Plot for 4th Rocks versus Mines Attribute (qqplotAttribute.py) | |
| | Listing 2-5: Using Python Pandas to Read and Summarize Data (pandasReadSummarize.py) | |
| (ipynb)Bowles_2.3_rocks | Listing 2-6: Parallel Coordinates Graph for Real Attribute Visualization (linePlots.py) | |
| | Listing 2-7: Cross Plotting Pairs of Attributes (corrPlot.py) | Figure 2-4: Cross-plot of rocks versus mines attributes 2 and 3 |
| | Listing 2-8: Correlation between Classification Target and Real Attributes (targetCorr.py) | |
| | Listing 2-9: Pearson’s Correlation Calculation for Attributes 2 versus 3 and 2 versus 21 (corrCalc.py) | Figure 2-5: Cross-plot of rocks versus mines attributes 2 and 21 |
| | Listing 2-10: Presenting Attribute Correlations Visually (sampleCorrHeatMap.py) | |
| (ipynb)Bowles_2.4_abalone | Listing 2-11: Read and Summarize the Abalone Data Set (abaloneSummary.py) | |
| | Listing 2-12: Parallel Coordinate Plot for Abalone Data (abaloneParallelPlot.py) | Equation 2-5: Using logit transform for soft range compression |
| | Listing 2-13: Correlation Calculations for Abalone Data (abaloneCorrHeat.py) | |
| (ipynb)Bowles_2.5_wine | Listing 2-14: Wine Data Summary (wineSummary.py) | |
| | Listing 2-15: Producing a Parallel Coordinate Plot for Wine Data (wineParallelPlot.py) | Figure 2-19: Correlation heat map for the wine data |
| (ipynb)Bowles_2.6_glass | Listing 2-16: Summary of Glass Data Set (glassSummary.py) | Figure 2-20: Box plot of the glass data |
| | Listing 2-17: Parallel Coordinate Plot for the Glass Data | Figure 2-21: Parallel coordinate plot for the glass data |
| (ipynb)Bowles_3.3_rocks | Listing 3-1: Comparison of MSE, MAE and RMSE (regressionErrorMeasures.py) | Figure 3-9: Confusion matrix example |
| | Listing 3-2: Measuring Performance for Classifier Trained on Rocks-Versus-Mines (classifierPerformance_RocksVMines.py) | Table 3-2: Dependence of Misclassification Error on Decision Threshold |
| | | Table 3-3: Cost of Mistakes for Different Decision Thresholds |
| | | Figure 3-10: In-sample ROC for rocks-versus-mines classifier |
| | | Figure 3-11: Out-of-sample ROC for rocks-versus-mines classifier |
| (ipynb)Bowles_3.4_wine_rocks | Listing 3-3: Forward Stepwise Regression: Wine Quality Data (fwdStepwiseWine.py) | Figure 3-13: Wine quality prediction error using forward stepwise regression |
| | Listing 3-4: Forward Stepwise Regression Output (fwdStepwiseWineOutput.txt) | |
| | | Figure 3-14: Actual taste scores versus predictions generated with forward stepwise regression |
| | | Figure 3-15: Histogram of wine taste prediction error with forward stepwise regression |
| | Listing 3-5: Predicting Wine Taste with Ridge Regression (ridgeWine.py) | Figure 3-16: Wine quality prediction error using ridge regression |
| | Listing 3-6: Ridge Regression Output (ridgeWineOutput.txt) | |
| | | Figure 3-17: Actual taste scores versus predictions generated with ridge regression |
| | | Figure 3-18: Histogram of wine taste prediction error with ridge regression |
| | Listing 3-7: Rocks Versus Mines Using Ridge Regression (classifierRidgeRocksVMines.py) | Listing 3-8: Output from Classification Model for Rocks Versus Mines Using Ridge Regression (classifierRidgeRocksVMinesOutput.txt) |
| | | Figure 3-19: AUC for the rocks-versus-mines classifier using ridge regression |
| | | Figure 3-20: Plot of actual versus prediction for the rocks-versus-mines classifier using ridge regression |
| (ipynb)Bowles_4.3_wine | Listing 4-1: LARS Algorithm for Predicting Wine Taste (larsWine2.py) | Figure 4-3: Coefficient curves for LARS regression on wine data |
| | Listing 4-2: 10-Fold Cross-Validation to Determine Best Set of Coefficients (larsWineCV.py) | Figure 4-4: Cross-validated mean square error for LARS on wine data |
| | Listing 4-3: Glmnet Algorithm (glmnetWine.py) | Figure 4-6: Coefficient curves for glmnet models for predicting wine taste |
| (ipynb)Bowles_4.4_rocks_wine_abalone | Listing 4-4: Converting a Classification Problem to an Ordinary Regression Problem by Assigning Numeric Values to Binary Labels | Figure 4-7: Coefficient curves for rocks versus mines classification problem solved by converting to labels |
| | Listing 4-5: Basis Expansion for Wine Taste Prediction | Figure 4-8: Functions generated to expand wine attribute session |
| | Listing 4-6: Coding Categorical Variable for Penalized Linear Regression: Abalone Data (larsAbalone.py) | |
| (ipynb)Bowles_5.2_wine | Listing 5-1: Using Cross-Validation to Estimate Out-of-Sample Error with Lasso Modeling Wine Taste (wineLassoCV.py) | Figure 5-1: ... un-normalized Y |
| | | Figure 5-2: ... normalized Y |
| | | Figure 5-3: ... un-normalized X and Y |
| | Listing 5-2: Lasso Training on Full Data Set (wineLassoCoefCurves.py) | Figure 5-4: Coefficient curves for Lasso trained to predict wine quality |
| | | Figure 5-5: Coefficient curves for Lasso trained on un-normalized Xs |
| | Listing 5-3: Using Out-of-Sample Error to Evaluate New Attributes for Predicting Wine Quality (wineExpandedLassoCV.py) | Figure 5-6: Cross-validation error curves for Lasso trained on wine quality data with expanded feature set |
| (ipynb)Bowles_5.3_rocks | Listing 5-4: Using ElasticNet Regression to Build a Binary (Two-Class) Classifier (rocksVMinesENetRegCV.py) | Figure 5-7: Out-of-sample classifier misclassification performance |
| | | Figure 5-8: Out-of-sample classifier AUC performance |
| | | Figure 5-9: Receiver operating characteristic for best performing classifier |
| (ipynb)Bowles_5.4_rocks | Listing 5-5: Coefficient Trajectories for ElasticNet Trained on Rocks versus Mines Data (rocksVMinesCoefCurves.py) | Figure 5-10: Coefficient curves for ElasticNet trained on rocks versus mines data |
| | Listing 5-6: Penalized Logistic Regression Trained on Rocks versus Mines Data (rocksVMinesGlmnet.py) | Figure 5-11: Coefficient curves for ElasticNet penalized logistic regression trained on rocks versus mines data |
| (ipynb)Bowles_5.5_glass | Listing 5-7: Multiclass Classification with Penalized Linear Regression: Classifying Crime Scene Glass Samples (glassENetRegCV.py) | Figure 5-12: Misclassification error rates using penalized linear regression for glass classification |